Dataset (machine learning)

A collection of raw data, commonly (but not exclusively) organized in one of the following formats: [1]

  • a spreadsheet
  • a file in CSV format
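
As a rough illustration of the CSV case, the sketch below parses a small piece of CSV text into a list of examples using Python's standard csv module. The column names, the values, and the load_examples helper are all hypothetical and chosen only for illustration; a real dataset would normally be read from a file.

```python
import csv
import io

# Hypothetical CSV content; column names and values are illustrative only.
CSV_TEXT = """temperature_c,humidity_pct,rained
21.5,40,no
18.0,85,yes
25.2,30,no
"""


def load_examples(csv_text):
    """Parse CSV text into a list of dicts, one dict per example (row)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)


examples = load_examples(CSV_TEXT)
print(len(examples))  # number of examples (rows) in the dataset
print(examples[0])    # first example as a column-name -> value mapping
```

Note that csv.DictReader returns every value as a string, so numeric columns would still need converting before use.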

Characteristics

A dataset is characterized by its size and diversity. Good datasets are both large and highly diverse: [2]

  • Size indicates the number of examples.
  • Diversity indicates the range those examples cover.
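
The two characteristics above can be sketched in code. The source does not define a specific diversity metric, so this example assumes a simple one: the min/max range of a numeric column and the set of distinct values of a categorical column. The data and the helper names (dataset_size, numeric_range, distinct_values) are hypothetical.

```python
# Hypothetical dataset: each dict is one example.
examples = [
    {"temperature_c": 21.5, "rained": "no"},
    {"temperature_c": 18.0, "rained": "yes"},
    {"temperature_c": 25.2, "rained": "no"},
]


def dataset_size(examples):
    """Size: the number of examples in the dataset."""
    return len(examples)


def numeric_range(examples, column):
    """Assumed diversity proxy: the (min, max) a numeric column covers."""
    values = [ex[column] for ex in examples]
    return min(values), max(values)


def distinct_values(examples, column):
    """Assumed diversity proxy: the distinct values a column takes."""
    return {ex[column] for ex in examples}


size = dataset_size(examples)
temp_range = numeric_range(examples, "temperature_c")
labels = distinct_values(examples, "rained")
print(size, temp_range, labels)
```

A narrow range or a single distinct label would suggest the examples cover little of the space, even if the dataset is large.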

Footnotes

  1. https://developers.google.com/machine-learning/glossary#dataset

  2. https://developers.google.com/machine-learning/intro-to-ml/supervised

2024 © ak